The Prediction of Effectiveness as a Factory Foreman¹


Martin M. Bruce²
Dunlap and Associates, Inc., New York, N.Y. 


I. INTRODUCTION 


THE LITERATURE is replete with articles
on the subject of supervisors in industry.
The importance of their position has been
cited by Mandell (18) and Lawshe (14),
among others. However, there is a paucity
of psychological research on foremen
specifically (1, 10).


II. THE PROBLEM 


The primary purpose of this study was
to determine the extent and nature of 
importance of certain skills, abilities, and 
personality characteristics in relation to 
effectiveness as a factory foreman. The 
five specific problems were: 

1. Is the rating which served as the 
criterion multiple or simple? 

2. If the criterion is multiple, what are
the optimal weights assignable to the
various factors? If the criterion is simple,
what part or parts or combination will
serve as the best standard?

3. What is the nature of the inter- 
relationships among certain personality 
characteristics, skills, and abilities, on 
the one hand, and measures of effective- 
ness as a factory foreman on the other? 

4. How adequately can effectiveness as 
a factory foreman be predicted by means 
of the battery of tests chosen? 

5. What are the optimal weights as- 
signable to each of the predictors? 


III. METHOD


The Sample 


The subjects in this study were 107
foremen in a major tobacco firm in the
United States. This group constituted a
100% sample of the personnel who bore
the title of foreman at the time of testing.
The men were located in four separate
buildings in a Southern city. Their
ages ranged from 20 to 63, with a mean
of 39.1 and a sigma of 8.99. All had been
employed by the company for at least one
year and each had been in the job of
foreman for a minimum of six months.
The maximum length of service was 17
years and the mean for the group was
10.1 years. The foremen who constituted
this sample were nonunionized.

¹ This monograph is based on a thesis
submitted in partial fulfillment of the
requirements for the degree of Doctor of
Philosophy at New York University (5).

² The author is indebted to Professor John J.
Sullivan, Chairman of the Doctoral Committee,
who aided in the formulation of the problem
and saw the research to its completion. To Miss
Ida Moore I am grateful for aid in the
computations. Mr. Corlin O. Beum gave many
helpful suggestions in connection with statistical
procedures. Finally, my deep appreciation to my
wife, without whose personal sacrifices this work
could not have been completed.

Each foreman supervises a group of approximately
15 workmen engaged in particular phases
of the company’s operations involving the
preparation of tobacco leaf into pipe or smoking
tobacco. The men working under the direction of
the foreman operate stemming, shredding,
moistening, flavoring, sealing, stamping, and packing
machines. The foreman interprets written and
verbal orders and determines the procedure of
work for his group. He receives instructions from
the plant superintendent, assistant plant
superintendent, and chief engineer.

He assigns duties to men and women working 
under his direction and inspects their work for 
quality and quantity. He assigns workers to ma- 
chines and personally trains them in machine 
operation when necessary. The foreman keeps 
records of output for his department which he 
submits to the plant superintendent at the end 
of the work day. He is responsible for maintain- 
ing harmony among his group of workers and 
is responsible for their morale and health on the 
job. He institutes and carries out plant accident 
prevention programs by teaching safety through 
demonstrations and by continual observation for 
lack of adherence to the rules. 

The foreman helps train subordinates, usually 
titled assistant foremen, and aids them during 
emergencies. He performs related duties of a 
supervisory and administrative nature. He occa- 
sionally operates machinery in order to maintain 
level of production in the absence of workers. 


The Criterion Variables 


The criteria employed in this study 
are ratings by management. These rat- 
ings are completed annually by the plant 
superintendent and personnel director 
of the factory. The assistant plant super- 
intendent also completed the ratings on 
the foremen, but only the over-all score 
on this rating form was available in the 
company records. Areas covered by the 
rating include: leadership, knowledge, 
dependability and judgment, initiative 
and creativeness. The scale ranges from
75% to 100%, with the following descriptive
adjectives offered on the form: 100%,
exceptional; 95%, above average; 90%,
average; 85%, below average; 75%, fair.

“Leadership” is divided into two subdivisions
entitled “Personality Traits” and
“Organization.” These contain ten and four
items respectively. “Knowledge” contains
three items; “Dependability and Judgment”
seven items; and “Initiative and
Creativeness” eight items.

Scores for each of the four areas as
well as a total score were employed as
criterion variables. The score for a
particular area, e.g., Knowledge, is the
sum of the individual items in the section.
The total rating is the mean score
for the four parts. These criterion measures,
in the above form, were available
from management. One other measure
was available in the company records:
a composite score of the total ratings of
the plant superintendent, personnel
manager, and assistant plant superintendent.
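
The scoring rule just described (area score as the sum of its items, total rating as the mean of the four area scores) can be sketched as follows. This is only an illustration that takes the description literally; the item counts come from the text, while the function names and the item ratings are hypothetical.

```python
from statistics import mean

# Item counts per area as stated in the text
# (Leadership = 10 "Personality Traits" + 4 "Organization" items).
ITEMS_PER_AREA = {
    "leadership": 14,
    "knowledge": 3,
    "dependability_and_judgment": 7,
    "initiative_and_creativeness": 8,
}


def area_scores(item_ratings):
    """Sum the item ratings within each area of the rating form."""
    for area, items in item_ratings.items():
        if len(items) != ITEMS_PER_AREA[area]:
            raise ValueError(f"{area}: expected {ITEMS_PER_AREA[area]} items")
    return {area: sum(items) for area, items in item_ratings.items()}


def total_rating(scores):
    """Total rating, read literally as the mean of the four area scores."""
    return mean(scores.values())


# Hypothetical foreman rated 90 ("average") on every item.
example = {area: [90] * n for area, n in ITEMS_PER_AREA.items()}
scores = area_scores(example)
total = total_rating(scores)
```

Because the areas contain different numbers of items, a literal sum gives longer sections a larger raw score; whether the company rescaled the area scores before averaging is not stated in the text.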


The Predictor Variables 


The predictors included age, years with 
the firm, education in number of years, 
and the following tests: 


The Personality Inventory. This test has been
widely used since its publication by the author,
Robert G. Bernreuter, in 1931. By 1946 there
were some 135 published studies pertaining to
this instrument. The inventory can be scored
for six factors: neurotic tendency, self-sufficiency,
introversion, dominance, self-confidence, and
sociability. In this study two of the six scores,
self-sufficiency and introversion, were not used.
Split-half reliabilities reported by the author are
based on student populations. They are:
neurotic tendency, .91 and .88; dominance, .89 and
.88; self-confidence, .86; sociability, .78.

The test is published by Stanford University 
Press at Stanford, California. 

Ess-Ay Inventory. This test offers an over-all 
score purportedly indicative of persuasive abil- 
ity. It consists of 155 personality items and has 
an estimated odd-even reliability of .86. This 
inventory is published by the Personnel Insti- 
tute, New York, N.Y. 

Otis Self-Administering Test of Mental Ability.
This test has been employed extensively in
educational as well as industrial organizations
for many years. The 75 items, according to the
author, measure verbal intelligence. The test
may be administered for either 20- or 30-minute
time periods. The longer administration time
was used in this study. A test-retest reliability
of .92 is reported by the test’s author in the
manual. Copies of the Otis may be obtained from
the World Book Company, Yonkers, New York.

Vocabulary Inventory. This test is a 100-item, 
multiple-choice, untimed form. The task re- 
quired of the examinee is to select from among 
four alternatives the one which most nearly 
means the same as the key word. The test has 
an estimated odd-even reliability of .91. It is
published by the Personnel Institute, New York, 
N.Y. 

Social Intelligence Test. This test is the special
edition of the five-part test originally published
by George Washington University. This
short form, published by the Psychological
Corporation, contains only two parts entitled
“Judgment in Social Situations” and “Observation of
Human Behavior.” Neither the manual published
by George Washington University nor the
one distributed by the Psychological Corporation
offers any reliability data. Howard R. Taylor,
in his review in the Third Mental Measurements
Yearbook, reports a test-retest reliability of
.89 for the five-section form.

B-B-S Inventory. The content of this test is a 
duplicate of the General Clerical Test pub- 
lished by the Psychological Corporation, with the 
exception of the omissions of the sections on 
filing and location of numerical errors. A dif- 
ference also exists in administration procedure. 
In this test directions for the seven parts are 
read by the examinee during the time allotted 
for the completion of the section. The seven 
parts are timed individually. These parts consist 
of items covering the following: comparing and 
checking clerical copy, computation, reading, 
spelling, vocabulary, arithmetical problems, and 
grammar. The test-retest reliability for over-all 
score is .94. It is published by the Personnel
Institute, New York, N.Y. 

Mechanical Comprehension Test, Form AA. 
This test is a 60-item untimed form which uti- 
lizes a pictorial multiple-choice format to meas- 
ure comprehension of mechanical relationships. 
Form AA is the lowest level of difficulty of the 
various forms of the test. The author reports a 
coefficient of reliability of .84 obtained by the
split-half method. The Mechanical Comprehen- 
sion Test is published by the Psychological Cor- 
poration, New York, N.Y. 

Test of Mechanical Ability. This test was de- 
veloped by Jack Hazlehurst at Northwestern Uni- 
versity in 1940 on the basis of a factorial study 
of paper-and-pencil tests of mechanical ability. 
The test is of the paper-and-pencil variety and 
has four sections which are administered with 
time limits. These four sections are: spatial per- 
ception, the ability to distinguish between 
lengths of lines; spatial visualization, the ability 
to determine the direction in which a bolt moves 
in a block; tool recognition, the ability to identify
common tools by name; and blueprint reading,
the ability to indicate the lines missing in
blueprint drawings. Reliability computed by the 
odd-even method is .82. This test is published 
by the Personnel Institute, Inc., New York, N.Y. 


The 39 criterion and predictor var- 
iables and their numerical designations 
as used in the study are: 


Over-All Average Ratings 
1. Most recent over-all rating including the 
evaluation by the plant superintendent, assistant 
plant superintendent, and personnel director. 
2. Previous year’s over-all rating including 
evaluations by plant superintendent, assistant 
plant superintendent, and personnel director. 


Personnel Director's Ratings
3. Personnel director's rating for leadership.
4. Personnel director's rating for knowledge.
5. Personnel director's rating for dependability
and judgment.
6. Personnel director's rating for initiative and
creativeness.
7. Personnel director's over-all rating.


Plant Superintendent's Ratings
8. Plant superintendent's rating for leadership.
9. Plant superintendent's rating for knowledge.
10. Plant superintendent's rating for dependability
and judgment.
11. Plant superintendent's rating for initiative
and creativeness.
12. Plant superintendent's over-all rating.


Census Data 
13. Age. 
14. Years of education. 
15. Years with the company. 


The Personality Inventory
16. Neurotic tendency.
17. Dominance.
18. Self-confidence.
19. Sociability.


20. Ess-Ay Inventory.
21. Otis Self-Administering Test of Mental Ability.
22. Vocabulary Inventory.


Social Intelligence Test 
23. Over-all score. 
24. Judgment in social situations. 
25. Observation of human behavior. 


B-B-S Inventory 
26. Over-all score. 
27. Comparing and checking clerical copy. 
28. Computation.
29. Reading.
30. Spelling.
31. Vocabulary.
32. Arithmetical problems.
33. Grammar.


34. Mechanical Comprehension Test, Form AA.


Test of Mechanical Ability
35. Over-all score.
36. Spatial perception.
37. Spatial visualization.
38. Tool recognition.
39. Blueprint reading.


IV. BRIEF SURVEY OF THE LITERATURE 


Considering the quantity of literature that an- 
nually is published on clerical workers, various 
professions, executives, and other specific job 
categories, there is a dearth of psychological 
research with respect to the important group of 
foremen. Ghiselli and Brown (10), reviewing the
literature, found 185 instances of reports of the 
effectiveness of intelligence tests in the selection 
of employees in various occupations, only nine 
of which dealt with the selection of supervisors. 
After surveying the literature covering experi- 
mental studies of supervisory selection, Bou- 
langer (3) concluded that such investigations 
have been few in number. 

Many have written on the subject of the use
of tests and their value with the broad group
of supervisory personnel (1, 2, 4, 5, 6, 7, 8, 9,
12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
25, 26, 27, 28, 29, 31, 35).

One of the most recent studies of foremen re- 
ported in the literature is by Sparks (30). This 
investigation is particularly pertinent to the pres- 
ent study. Sparks reported on the value of the 
Bernreuter Personality Inventory in differentiat- 
ing good from poor foremen in a large oil re- 
finery in the South. He found that scores from 
self-confidence and sociability keys did not cor- 
relate significantly with a criterion derived from 
rankings of senior supervisors and they did not 
differentiate good from poor foremen.

Utilizing the personality inventory, 
Hanawalt and Richardson (11) concluded that 
the chief difference between leaders and non- 
leaders is one of adjustment; the leaders are 
better adjusted in general than the nonleaders. 
The leaders are also more dominant. 


V. PROCEDURE 


The test data were collected over a 
period of three weeks. Groups of fore- 
men were tested by a trained test admin- 
istrator. The groups were formed on the 
basis of the individual’s ability to be 
spared from his job for a full day. All 
eight tests were completed in a single 
day with standard rest sessions between 
forms and an hour for lunch. The cri- 
terion data were completed prior to the 
collection of test data. These data and 
personnel data were obtained from com- 
pany personnel records. 

The first step in the analysis of data
consisted of recording all test scores,
other variables, and criterion data. The
first analyses were completed on the ratings
which served as the criterion variables.
To determine the value of these
ratings as a measure of the effectiveness
of foremen, and assuming that normal
distribution of ratings is desirable, a test
of goodness of fit to the normal curve of
over-all ratings was made. A frequency
distribution of the ratings was computed
and cumulative percentages plotted on
probability paper. A normal curve is
indicated when the center portion of the
resultant curve is a straight line. This is
an empirical, not a statistical, determination
of normality. As a check, the
mean and standard deviation of the
distribution of ratings were computed, these
points were plotted on the same probability
paper, and the line determined by
these points was plotted. In a normal
distribution, these two lines coincide.

The reliability of the criterion was
determined by correlating the combined
over-all ratings of the judges for the
previous year with the combined over-all
ratings of the same judges for the present
period. This coefficient is to a degree a
measure of test-retest reliability. However,
perfect reliability cannot be expected
even if the rating form were perfectly
reliable, for the men must change
somewhat in the course of a year.
Interrater reliability coefficients were obtained
by correlating the over-all ratings of the
foremen prepared by the personnel director
and plant superintendent. Additional
measures of interrater reliability
were obtained by correlating the scores
of the plant superintendent for each of
the four parts of the rating form with the
corresponding scores of the other rater.
This provided five measures of interrater
reliability.

A final criterion from among the ratings
available was chosen and this was
employed in computing a Wherry-Doolittle
multiple regression equation that
would maximally predict the criterion
selected. The multiple R was computed
using the product-moment correlations
whose computation is described below.

Pearsonian product-moment correlations
were computed among all criterion
variables and all predictor variables.
There are 12 criterion variables and 27
predictor variables, a total of 39. All of
the correlations were computed with the
aid of “C-D Correlation Charts” which
were devised by Cureton and Dunlap.
While there are less time-consuming
methods of computing r, this was selected
because of the internal checks offered
by correlation charts.


Criterion Variables


A variety of statistical techniques was
employed in order to determine the nature
of the criterion measures. First it
was desirable to know if the ratings
prepared by the evaluators were normally
distributed. The technique employed to
determine this was the plotting of
cumulative percentages on normal probability
paper for four sets of ratings:

1. Most recent over-all ratings by the
personnel director, plant superintendent,
and assistant plant superintendent
combined.

2. Over-all ratings for the previous
year by the personnel director, plant
superintendent, and assistant plant
superintendent combined.

3. Most recent over-all ratings by the
personnel director.

4. Most recent over-all ratings by the
plant superintendent.

From the analysis of the plots, as
described in the Procedure, it may be
concluded that the criterion measures are for
practical purposes normally distributed.


The equivalent of test-retest reliability
was determined by correlating the most
recent over-all ratings with the previous
year’s over-all ratings. The resultant r
of .842 seems sufficiently high to justify
confidence. Measures of interrater reliability
appear in the correlation matrix
shown later. There are five reliability
coefficients in this matrix. These are
correlations between variables 7 and 12, 3
and 8, 4 and 9, 5 and 10, and 6 and 11.
The first is a measure of rater agreement
with respect to over-all rating. The
correlation was .629, also sufficiently high to
justify confidence. The other four
correlations give us interrater reliability for
factors of leadership, knowledge, dependability
and judgment, and initiative and
creativeness in that order. These correlations
proved to be .539, .575, .593, and
.642. It is evident that there is at least
some consensus of opinion concerning
the effectiveness of the foremen in the
various respective areas.
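
The reliability coefficients above are ordinary product-moment correlations between two sets of ratings of the same foremen. A minimal sketch of the raw-score formula follows; it stands in for the C-D Correlation Charts used in the study, and the function name and the two short rating lists are hypothetical, not data from the company records.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation computed from raw-score sums."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined when a list is constant")
    return numerator / denominator


# Hypothetical over-all ratings of six foremen in two successive years.
previous_year = [88, 92, 85, 95, 90, 79]
most_recent = [90, 91, 84, 96, 88, 82]
retest_r = pearson_r(previous_year, most_recent)
```

The same function applied to the two raters' scores for one part of the form would give the corresponding interrater coefficient.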

A common limitation in multiple-item
ratings obtained in industry is that they
are in effect not composed of multiple
factors, but are ratings of a single factor.
In order to determine if the various
parts included in the over-all ratings
were independent or all measures of the
same thing, a crude cluster analysis was
made. This consisted of computing the
mean and standard deviation of the
intercorrelations of the criterion variables.
A low mean and a large sigma would
suggest the operation of more than one
factor. A high mean and a standard
deviation small in comparison to the mean,
a leptokurtic distribution, would suggest
that all intercorrelations are equal to the
mean value and the small fluctuations
are probably due to chance or sampling
errors. This would indicate that all
variables are measuring the same thing, the
one factor probably explainable in terms
of halo.
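
The crude cluster analysis described above reduces to two summary figures taken over the off-diagonal entries of the correlation matrix. The sketch below assumes a symmetric matrix held as a list of lists; the function name and the small example matrix are hypothetical and are not the study's data.

```python
from itertools import combinations
from statistics import mean, pstdev


def intercorrelation_summary(matrix):
    """Mean and standard deviation of the distinct off-diagonal
    correlations in a symmetric correlation matrix."""
    n = len(matrix)
    values = [matrix[i][j] for i, j in combinations(range(n), 2)]
    return mean(values), pstdev(values)


# Hypothetical intercorrelations among four rating parts.
example = [
    [1.00, 0.74, 0.68, 0.72],
    [0.74, 1.00, 0.66, 0.72],
    [0.68, 0.66, 1.00, 0.70],
    [0.72, 0.72, 0.70, 1.00],
]
mean_r, sigma_r = intercorrelation_summary(example)
```

A high `mean_r` with a small `sigma_r` is the pattern the text attributes to a single underlying factor such as halo; a low mean with a large sigma would point to more than one factor.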


TABLE 1

PER CENT CONTRIBUTION OF EACH PART OF RATING

                                   Per Cent Contribution
Variable                        Personnel   Plant             Total
                                Director    Superintendent
Leadership                         9.86        12.73          22.59
Knowledge                         12.74        14.78          27.52
Dependability and judgment         8.52        12.56          21.09
Initiative and creativeness       13.44        15.36          28.80
Total                             44.57        55.43         100.00



An analysis was made in order to
determine the percentage of contribution
of each of the four parts of the ratings
of the personnel director and plant
superintendent. Table 1 shows the results of
this analysis. It is readily noted that all
parts contribute approximately the same
amount to the total variance and thus
there is no compelling reason to select
from among them for the criterion to be
used in the computation of the multiple
regression equation.
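
As a small arithmetic check on Table 1, the row and column figures can be added up and compared with the printed totals. The values below are copied from the table; the function name and the rounding tolerance are assumptions made for this sketch.

```python
# (personnel director, plant superintendent, row total) from Table 1.
TABLE_1 = {
    "Leadership": (9.86, 12.73, 22.59),
    "Knowledge": (12.74, 14.78, 27.52),
    "Dependability and judgment": (8.52, 12.56, 21.09),
    "Initiative and creativeness": (13.44, 15.36, 28.80),
}
PRINTED_TOTALS = (44.57, 55.43, 100.00)


def check_table(rows, totals, tolerance=0.015):
    """Return labels of rows or columns whose printed total differs from
    the sum of its entries by more than the rounding tolerance."""
    problems = []
    for name, (director, superintendent, row_total) in rows.items():
        if abs(director + superintendent - row_total) > tolerance:
            problems.append(name)
    for col, label in enumerate(("director", "superintendent", "total")):
        column_sum = sum(values[col] for values in rows.values())
        if abs(column_sum - totals[col]) > tolerance:
            problems.append(label)
    return problems


discrepancies = check_table(TABLE_1, PRINTED_TOTALS)
largest = max(values[2] for values in TABLE_1.values())
smallest = min(values[2] for values in TABLE_1.values())
```

The printed figures agree with their sums to within rounding, and the four parts range from `smallest` to `largest` per cent of the total, which is the spread behind the statement that the parts contribute roughly equally.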

Since we are dealing with variances, all 
of the parts could be added, thereby tak- 
ing advantage of the nonoverlap of vari- 
ances. Because the average of eight meas- 
ures of the same thing is probably more 
reliable than any one, the single criterion 
selected should be the total. However, 
what is probably an even more reliable 
criterion than the total of these two men
is the combined rating of the personnel 
director, plant superintendent, and assist- 
ant plant superintendent. This combined 
rating as an over-all score was available 
to the researcher and was utilized as the 
criterion to determine the value of each 
of the predictors. This criterion measure 
is variable 1. 


Predictor Variables 


Having determined the nature of the 
rating which served as the criterion, the 
next step was to determine the nature of 
the intercorrelations among the person- 
ality characteristics, skills, and abilities 
and the various parts of the ratings. 
Tables 2, 3, and 4 present the results 
of correlating the test scores, age, years 
of education, years with the company, 
and criterion measures. 

If we reverse the correlations involving 
those factors that have a negative conno- 
tation, e.g., neurotic tendency, we find 
that practically all correlations are posi- 
tive. And this is to be expected. It is a 
common research finding that ability 
measures tend to be positively correlated; 
an individual skilled in one area tends 
to be competent in others. The more 
intelligent individuals tend to have better
clerical skills, larger vocabularies, better
mechanical comprehension, etc. (32, 33).

Each of the 27 predictor variables has
face validity. The fact that these 27
variables and the 12 criterion measures are
practically all positively correlated
suggests that the criterion measures have
validity.

An important consideration with respect
to the correlation matrix is the
faith that can be placed in each individual
correlation. Fisher's z transformation
was used, indicating that coefficients
.191 and above are significant at the 5%
level. Those that are .224 and greater are
significant at the 1% level. Seven
predictor variables are significant at the 1%
level while 13 of the 27 predictors, almost
half, are significant at the 5% level.
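
A cutoff of this kind can be derived from Fisher's z transformation, under which z = atanh(r) has a standard error of approximately 1/sqrt(N - 3) when the true correlation is zero. The sketch below is one conventional way to obtain a critical r; the function name and the choice between a one- and a two-tailed test are assumptions, so the values it yields need not reproduce the published .191 and .224 exactly.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n, alpha, two_tailed=True):
    """Smallest |r| that differs significantly from zero at level alpha,
    using Fisher's z with standard error 1 / sqrt(n - 3)."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    tail = alpha / 2 if two_tailed else alpha
    z_critical = NormalDist().inv_cdf(1 - tail)
    # Convert the critical z back to the correlation scale.
    return tanh(z_critical / sqrt(n - 3))


# Sample size taken from the text: 107 foremen.
r_at_5_percent = critical_r(107, 0.05)
r_at_1_percent = critical_r(107, 0.01)
```

Any correlation in the matrix whose absolute value exceeds the relevant cutoff would be counted as significant at that level.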

Age. In examining the first predictor
variable, age, we find that it is negatively
linearly correlated with the criterion.
There is no indication of curvilinear
relationship with the criterion. The
correlation is -.200, significant at the 5%
level. The size of this r may be in part
due to the emphasis today in industry on
youth. The younger worker is often
looked upon favorably because of his
youth.

In this population there is a tendency 
for the younger man to be better edu- 
cated, which is to be expected in light 
of the rise in educational level through 
the years in this country. The younger 
men tend to score higher on the tests of 
knowledge, skill, and ability. Of the 20 
variables covering these factors, nine are 
significant at the 5% level, suggesting 
that in this population age is of signifi- 
cance in performance on these tests. 

The younger foreman’s responses on
the Personality Inventory suggest a more
extroverted, dominant, and confident
person. However, of these three personality
measures, only the correlation between
dominance and age reaches a level
of significance that gives us some assurance
that the r was not obtained by
chance. The older men in this population
tend to be less persuasive and tend
not to do as well as the younger foremen
on the timed tests. There is a positive
correlation between score on the vocabulary
test, which is untimed, and the
age factor. Clinical findings indicate that
vocabulary tends to hold with age while
skills of various sorts tend to diminish.
This trend appears to be the same in this
population, but it should be noted that
the r does not reach the 5% level of
significance. Nevertheless, the trend is
definitely suggestive, since this correlation is
positive while the others are negative.


TABLE 3

 1   .842 .764 .681 .744 .834 .824 .806 .825 .877
 2   .601 .606 .706 .785 .812 .834 .826
 3   .738 .287 .539 .522 .516 .500 .554
 4   .659 .722 .883 .382 .407 .406
 5   .851 .522 .578 .593 .526 .593
 6   .893 .581 .653 .504 .642 .639
 7   .580 .616 .579 .577 .629
 8   .744 .860 .847 .933
 9   .711 .873
10   .778 .904
11   .930

* Correlation coefficients .191 and greater are significant at the 5% level. Correlation coefficients
.224 and greater are significant at the 1% level.

Years of education. Years of education 
is a variable that is correlated positively 
with all test variables if we again reverse 
the B1N and F1C scores on the Bernreuter
so that they have positive connotations.
Years of education correlated highest
with scores on the mental ability test,
which might well be expected for this 
population as well as for any other. Its 
correlation with the criterion is not suffi- 
ciently high, only .108, to expect that it 
could not have arisen by chance. The 
moderate positive correlation may be due 
to management's tendency to view the 
number of years of education of an em- 
ployee as a favorable factor. 

Years of education is significantly
correlated (1% level) with all of
the nonpersonality tests, except for Part
III score on the Mechanical Ability Test.
On the Bernreuter Personality Inventory 
the B4D score correlates with years of 
education to the extent that the r is sig- 
nificant well beyond the 1% level. It 
may be inferred then, at least for this 
population and these tests, that educa- 
tion is an important factor in success on 
tests of ability, skill, and knowledge. 
However, it may be of little or no im- 
portance in doing well as a factory fore- 
man. 

Years with the company. The number 
of years the foreman has been with the 
organization is of course closely asso- 
ciated with his age, as shown by the cor- 
relation of .526. For the most part, we 
find that the relationships between age 
and other predictor variables tend to be 
similar to the relationships between years 
with the firm and these same variables. 
However, there are some modifications. 

The older men who have been with
the firm for some time tend to do better
in the mechanical areas than those older
men who have not been with the firm
for as long a period of time. There
appears to be some relationship between
the rating received from members of
management and the foreman’s length
of service; those who are newer in the
organization tend to be rated somewhat
higher, but the correlation between this
variable and the criterion is only -.109,
which falls far short of the 5% level of
significance. It should also be noted that
the relationship between years with the
firm and other variables tends to be
modest. Only the r with grammar is
significant at the 1% level. There are three
correlations involving this variable that
are significant at the 5% level in
addition to the one mentioned above. One of
these three is positive. The longer the
foreman is with the company, the more
tools he seems to recognize. The men
with longer tenure seem to have less
knowledge of grammar, less persuasive
ability, and poorer judgment in social
situations.

Neurotic tendency. Those in this population
who show greater neurotic tendency
as measured by the B1N key on the
Bernreuter tend to be less proficient in
mental ability, vocabulary, and persuasive
ability. The correlations with virtually
all variables are negative, including
the r with the criterion. The correlation
with the criterion, -.098, might well have
arisen by chance.

Though the trend is in the direction
of poor test scores for those with greater
neurotic tendency, only one of the
correlations is significant at the 1% level and
two additional at the 5% level. The r's
with vocabulary and Part I of the Social
Intelligence Test reach the 5% level of
significance while the r with the Ess-Ay
Inventory score is significant at the 1% level.

As for the correlations with other
Bernreuter factors, the findings for this
population closely resemble those found in a
group of 157 engineering students as
reported in the manual for the inventory
by Bernreuter. The correlations between
B1N scores and B4D, F1C, and F2S scores
in the engineering student population
were respectively -.80, .95, and .32. In
this population of factory foremen the
correlations proved to be respectively
[illegible], .874, and .285. In the foreman
group as well as in Bernreuter’s student
population, those showing greater
neurotic tendency tended to be less
aggressive, less confident, and more solitary or
independent.

Dominance. The correlations between 
dominance as measured by the B4D key 
of the Bernreuter and variables purport- 
ing to measure skills and abilities and 
knowledge in various areas are positive 
with the exception of two. Both these 
negative correlations are close to zero and 
do not reach an acceptable level of sig- 
nificance. Of the 20 correlations between 
B4D and tests of a nonpersonality na- 
ture, five are significant at the 1% level 
and three more are significant at the 5% 
level. At the 1% level are the variables of
persuasive ability, mental ability, vocabu- 
lary, judgment in social situations, and 
understanding of human behavior. 

It appears that the more dominant 
foremen are those who are more capable 
in most of the areas measured by the 
tests employed. The importance of domi- 
nance in effectiveness as a factory fore- 
man is shown in the correlation of .200 
with the criterion. The r is significant at 
the 5% level but not at the 1% level. It 
appears to be a factor of some importance 
here. 

Self-confidence. The F1C score is highly
correlated with the B1N score in this
population as it was in Bernreuter’s
engineering student population. He
reported a correlation coefficient between
these two variables of .95, whereas in this
factory population the r is .874. Because
of the high correlation between these two
variables, the correlations between other
variables and F1C are very similar to the
correlation coefficients between B1N and
these same variables. In fact, the signs of
the correlations are the same in all cases
but one and the order of magnitude of
the correlation coefficients is very similar.
In the one instance of reversal of sign of
r, both correlations are close to zero and
might well have come about by chance.

In the case of the B1N variable, the
r’s with three tests were significant at the
5% level and beyond. The same three
scores are also significantly correlated
with F1C: Social Intelligence Test Part I,
vocabulary, and persuasive ability. The r
between score for grammar and F1C
reaches the 5% level of significance
though it falls short of this level in the
case of the B1N variable. Even the correlations
with the criterion are similar,
neither being sufficiently high to justify 
confidence in the correlation’s not having 
come about through chance. 

Sociability. The Bernreuter F2S factor
is comparatively unrelated to the other 
Personality Inventory scores in this population.
This was also true in the population
reported on by the test author. Correlations
between F2S and B1N, B4D, and
F1C in Bernreuter’s student population
were .32, .07, and .11 respectively. In this 
group of foremen the correlations in the 
same order are .285, .006, and .188. 

The sociability factor does not appear 
to be of significant importance in success 
in the job of factory foreman and this 
variable appears to be unrelated to any 
significant extent with any of the other 
predictor variables except years of edu- 
cation and Part II and over-all score on 
the Social Intelligence Test. The r with 
education is .21, significant beyond the 
1% level, suggesting that in this popula- 
tion the men with more education tend 
to be less sociable or more solitary and 
independent. The high-scoring individuals,
those who tend to describe themselves
as nonsocial, tend to demonstrate
better observation of human behavior as 
measured by the Social Intelligence Test. 

Ess-Ay Inventory. Score on the Ess-Ay 
Inventory is the first predictor variable 
to be correlated very significantly, at the 
1% level, with the chosen criterion. It 
can be expected to contribute to the mul- 
tiple regression equation, and does, as 
will be shown later. Because of the simi- 
larity in items, it can be expected that 
there would be some relationship  be- 
tween scores on this inventory and those 
obtained on the Personality Inventory. 
This is the case for three of the four 
scores obtained on the Bernreuter. 

Correlations between persuasive ability 
and F1C, B4D, and B1N are significant
at the 1% level. Those who score higher 
on the Ess-Ay Inventory tend to be less 
neurotically inclined, more dominant, 
and more confident. The relatively high 
linear relationship between dominance 
score and persuasive ability suggests that 
the Ess-Ay Inventory will tend to be re- 
lated to other variables much the same 
as the B4D score. This is the case; the 
correlations are approximately of the 
same order of magnitude, and in all cases 
but one the signs are the same. 

Mental ability. Mental ability as measured
by the Otis Self-Administering Test
of Mental Ability is the variable that is 
most highly correlated with the criterion. 
The r is .290; and of course it makes the 
greatest contribution to the multiple R. 
Mental ability in this population is cor- 
related positively with all factors of abil- 
ity, skill, and knowledge. In fact, the 
Pearsonian r’s between this test score and 
all other test scores of a nonpersonality 
nature are the highest in the matrix. No 
other test score correlates as highly with 
other test scores as does the raw score on 
the test of mental ability. Those foremen
who do well on this test tend to do well
on all others, whether timed or untimed. 
In this population, performance on the 
Otis appears to cover a considerable 
amount of what is measured by the other 
tests. 

Vocabulary. It is a common research 
finding that knowledge of words cor- 
relates positively with measures of intel- 
lectual ability. The finding is similar 
within this group of 107 factory foremen. 
The correlation between scores on the 
100-item vocabulary test and the Otis is 
.620, one of the highest in the matrix. 
Therefore, it is to be expected that vo- 
cabulary scores would correlate posi- 
tively and highly with other test scores, 
as does the score on the Otis. However, 
the correlation coefficients are lower than 
the r’s between Otis and the other test 
scores in all cases except four. 

The 100-item vocabulary test, variable 
22, correlates .645 with the 35-item timed
vocabulary test which is part of the B-B-S 
Inventory, while the correlation between 
Otis scores and the 35-item vocabulary 
test, variable 1, is .603. Three of the 
four cases in which the vocabulary score 
is higher in its correlation with other test 
scores involve the Social Intelligence 
Test. However, the differences are modest
at best, being .004, .009, and .006. If a
generalization may be made from this 
population, it seems that the measures of 
“Judgment in Social Situations” and 
“Observation of Human Behavior” are
to a large degree measures of verbal in- 
telligence and verbal knowledge. 

Vocabulary is positively correlated 
with the criterion but not to the extent 
that confidence can be placed in the r’s
not having arisen by chance. Because of 
this and because vocabulary is so highly 
correlated with mental ability, which is 
most highly correlated with the criterion, 
it cannot be expected to be a valuable
contributor to the prediction of effectiveness
for this group in a battery of tests
that includes the Otis. 

Social Intelligence Test. Because the 
various scores obtained from the special 
edition of the Social Intelligence Test are 
highly correlated with scores on the Otis, 
it too cannot be expected to make an im- 
portant contribution to the prediction of 
effectiveness as a factory foreman. 

Part I of the Social Intelligence Test 
is highly correlated with Part II in this 
population, the r being .568. Part I is 
also highly correlated with the over-all 
score, the r being .731. However, it is Part 
II that contributes even greater variance 
to the over-all score. The correlation be- 
tween Part II and over-all score on the 
Social Intelligence Test in this factory 
group is .975, the highest r in the correla- 
tion matrix. The r between Part I and 
the criterion reaches the 5% level of sig- 
nificance, but neither of the other two 
scores correlates sufficiently highly with 
the criterion for us to place confidence 
in the suggested positive relationship. 

The over-all score on the Social Intelligence
Test correlates positively with all
other test scores of a nonpersonality na- 
ture, helping to demonstrate either that 
the foremen who are competent in one 
area tend to be competent in others, or 
that the tests to some extent are actually 
measuring the same thing, or a combina- 
tion of these conclusions. 

B-B-S Inventory. The pattern of posi- 
tive correlations between test scores and 
the criterion and among the test scores is 
evident also in the relationships between 
the total clerical test score and criterion 
and between this test score and the other 
predictors. This is also true of the part 
scores on the B-B-S Inventory. The over- 
all score correlates sufficiently highly with 
the criterion to reach the 1% level of significance,
suggesting that this total score
or one or more of the parts may have
value in predicting the criterion. 

The average correlation between part 
score and total score on the B-B-S Inven- 
tory is .709, which is spuriously high be- 
cause each part score is included in the 
total. However, it appears that in this 
population a clerical test consisting of 
fewer sections would probably do an ade- 
quate job. The average correlation with 
intelligence test score is .,11. Part VI, 
consisting of arithmetical problems, vari- 
able 32, correlates lowest with the Otis 
score and next to the lowest with the 
total B-B-S score. Because of its compar- 
ative “independence” and because this 
variable correlates to a significant extent 
with the criterion, it proves valuable in 
helping to predict that measure of effec- 
tiveness. 

Mechanical Comprehension Test, 
Form AA. The next test in the battery 
is Bennett’s Mechanical Comprehension 
Test, Form AA. From the standpoint of 
face validity it would be expected that 
performance on this test would correlate 
higher with the criterion than most other 
tests in the battery. Actually, the correla- 
tion between Mechanical Comprehen- 
sion Test scores and the most recent over- 
all rating chosen as the criterion is .033 
points above the mean correlation be- 
tween the 24 test predictor variables and 
the same criterion; the average correla- 
tion between test scores and the criterion 
being .181. Further, it should be noted 
that the correlation between test scores 
on the Bennett and the previous over-all 
rating is .032 points greater than the av- 
erage correlation with the criterion, 
which is .126. These differences are small 
but tend to be in the direction of higher 
than average correlations with the cri- 
terion. 

However, while the correlation between
this test score and the criterion is
significant at the 5% level, the test does
not seem to have anywhere near the predictive
value it apparently has in other
situations. In the manual for the test,
Bennett reports a correlation of .64 between
scores made by machine tool operator
trainees and a criterion similar to the
one used in this study. Other research 
studies cited in the manual reveal corre- 
lation coefficients in factory populations 
of .52, .30, .41, and .55. Another limita- 
tion in the value of this instrument in 
the population of factory foremen in- 
cluded in this study is the fact that the 
correlation between mental ability score 
and performance on the Bennett Mechan- 
ical Comprehension Test is quite high, 
the Pearsonian r being .528. 

Bennett reports a much lower correla- 
tion between the Otis Self-Administering 
Test of Mental Ability and his own test 
with high school students as the popula- 
tion. He reports an r of .25. However, the 
r is higher, .45, in a population of intro- 
ductory engineering subjects course en- 
rollees. Bennett does not further de- 
scribe these populations and it may be 
that the 107 factory foremen represent a 
less attenuated group with respect to in- 
tellectual spread than the two reported 
by Bennett. This would explain the 
higher correlation with Otis in the group 
of foremen in this study. 

Test of Mechanical Ability. The Test 
of Mechanical Ability (Hazelhurst’s) on a 
face validity basis would also be expected 
to correlate more highly with the cri- 
terion than the other test scores. This is 
the case to a moderate extent for the 
over-all score on this test, as it is also 
with the Bennett Mechanical Compre- 
hension Test. 

The total score on the Test of Mechan- 
ical Ability correlates .265 with the most 
recent rating selected as the criterion and 
.174 with the previous rating. These are
.084 and .048 points higher than the respective
mean correlations between other
test scores and criterion measures. The 
over-all score on the Test of Mechanical 
Ability correlates highly with Bennett's 
Mechanical Comprehension Test, the r
being .648. The relationship between the 
Mechanical Ability Test and the intel- 
ligence test score is also high, suggesting 
that this test measures to a considerable 
extent functions similar to those meas- 
ured by the Otis. 

All the part scores on the Test of 
Mechanical Ability correlate highly with 
the over-all score, but again this is a 
spurious correlation in that the part 
score is included in the total. The part 
scores are highly correlated among them- 
selves, all to a degree greater than chance 
expectancy, suggesting that these are not 
actually independent factors in the fac- 
tor-analysis meaning of the term. 

The over-all score and the score for 
Part II of the Test of Mechanical Ability 
correlate with the criterion beyond the 
1% level. Therefore, they might be ex- 
pected to contribute to the multiple R, 
but do not because of the comparatively 
high order of correlations between these 
variables and the Otis scores, years of ed- 
ucation, and age, three variables that 
make a contribution to the prediction of 
the criterion. 

Wherry-Doolittle test selection method.
The final step in the statistical analysis
of the data was the application of the
Wherry-Doolittle test selection method
for the computation of the multiple R.
This is a method for the selection of a
battery of predictors that will give the 
maximum shrunken multiple correlation 
with the criterion after a correction has 
been made for the chance error added by 
each test. The tests and other predictors 
are selected in the order of their contribution
to the multiple.
As a rule, the increase in the multiple
becomes less and less as additional varia- 
bles are included in the predictor team. 
Finally, the point is reached where the 
addition of another test adds more 
chance error than actual validity to the 
battery. Application of the Wherry 
shrinkage formula after the addition of 
each test will show when this point has 
been reached and no further additions 
are useful. 
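
The stopping rule described above can be sketched as follows. The shrinkage formula used here is one commonly cited form of Wherry's correction; the study does not print the exact formula it applied, so that choice is an assumption. The raw R values are hypothetical and were chosen only to show the point at which added chance error outweighs added validity; the function names are likewise illustrative.

```python
import math


def wherry_shrunken_r(raw_r: float, n: int, k: int) -> float:
    """Shrunken multiple R for k predictors and n cases (assumed Wherry form)."""
    shrunken_r2 = 1.0 - (1.0 - raw_r ** 2) * (n - 1) / (n - k - 1)
    # A negative shrunken R-squared is conventionally reported as zero.
    return math.sqrt(shrunken_r2) if shrunken_r2 > 0 else 0.0


def best_battery_size(raw_rs, n):
    """raw_rs[i] is the raw multiple R after i + 1 predictors have been added."""
    shrunken = [wherry_shrunken_r(r, n, k) for k, r in enumerate(raw_rs, start=1)]
    best_k = max(range(len(shrunken)), key=shrunken.__getitem__) + 1
    return best_k, shrunken


# Hypothetical raw multiple R values, NOT taken from the study.
HYPOTHETICAL_RAW_RS = [0.30, 0.35, 0.38, 0.385, 0.386]

best_k, shrunken_rs = best_battery_size(HYPOTHETICAL_RAW_RS, n=107)
for k, (raw, shrunk) in enumerate(zip(HYPOTHETICAL_RAW_RS, shrunken_rs), start=1):
    print(f"{k} predictors: raw R = {raw:.3f}, shrunken R = {shrunk:.3f}")
print("stop after", best_k, "predictors")
```

In this made-up series the raw R keeps rising, but the shrunken R peaks at the third predictor and declines afterward, which is the signal to stop adding tests.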

This method will not hold if two pre- 
dictor variables have some part in com- 
mon. When this is the case, spurious 
correlations are introduced. In the in- 
stances of variables 23, 26, and 35 we 
have over-all scores in which two or more 
parts are included. Therefore, in computing
R by the Wherry-Doolittle test
selection method, either these three var- 
iables or the 13 part scores must be 
omitted. The retention of the maximum 
number of predictor variables seemed de- 
sirable. Therefore, the three over-all 
scores were not included in the compu- 
tation of R. The results of this computa- 
tion indicate that the predictor variable 
that contributes most to the prediction 
of the criterion is the Otis score. The 
correlation of .290 between Otis score
and the criterion is significant at the 1%
level. This variable was the first to be 
selected in the Wherry-Doolittle process. 

The prediction of the criterion is increased
by .043 when the Ess-Ay Inventory
score is added. We see then that of
the factors measured, intelligence ap- 
parently is the most important in success 
as a foreman. Secondly, there is a person- 
ality factor that here is labeled “per- 
suasive ability.” 

The third predictor that was found 
to contribute to R is variable 32, the 
Part VI (Arithmetical Problems) score on 
the clerical test. This section of the B-B-S 
Inventory contains arithmetical problems.
With the inclusion of this variable
R is raised from .333 to .360. 

Again .027 is added to the multiple 
regression correlation coefficient when 
years of education, as a predictor, is 
added. Age is the next contributor to the 
extent of .011, raising the multiple R to
.398.

The standard error of the multiple R 
is .088 when it includes only the Otis.
When all five variables are included the 
standard error is reduced to .081. 

The formula for the prediction of the 
criterion in terms of standard scores was: 

Criterion = .484 (Otis) + .172 (Ess-Ay Inventory)
+ .167 (Arithmetical Problems) + .574 (education in years)
- .143 (age).
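
As an illustration of how the formula above would be applied, the following sketch computes a predicted criterion score from a set of standard scores. The weights are those printed in the formula; the dictionary keys, the function name, and the candidate's z-scores are hypothetical and serve only as an example.

```python
# Standard-score weights as printed in the prediction formula.
WEIGHTS = {
    "otis": 0.484,
    "ess_ay": 0.172,
    "arithmetic": 0.167,
    "education_years": 0.574,
    "age": -0.143,
}


def predict_criterion(z_scores: dict) -> float:
    """Weighted sum of standard scores; every predictor must be supplied."""
    return sum(WEIGHTS[name] * z_scores[name] for name in WEIGHTS)


# Hypothetical candidate: one SD above the mean on the Otis, half an SD
# above on the Ess-Ay, average on arithmetic and education, one SD younger.
candidate = {
    "otis": 1.0,
    "ess_ay": 0.5,
    "arithmetic": 0.0,
    "education_years": 0.0,
    "age": -1.0,
}

predicted = predict_criterion(candidate)
print(f"predicted criterion (standard-score units): {predicted:.3f}")
```

Because age carries a negative weight, a younger-than-average candidate receives a small positive contribution from that term.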

If only the first test, the Otis, is uti- 
lized the standard error of estimate is 
2.192. By adding the next four predictors
the standard error of estimate is reduced
to 2.102, a gain of but .09. As a
result, though the multiple R becomes 
greater with the inclusion of additional 
variables there is actually little gain be- 
cause the standard error is not appre- 
ciably reduced. 
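
The small gain can be reproduced with the usual textbook relation between the standard error of estimate and the multiple R, namely SD multiplied by the square root of (1 - R squared). The sketch below assumes that relation was the one used; the criterion's standard deviation is not quoted in this passage, so it is backed out from the reported value of 2.192 at R = .290. The function and variable names are illustrative.

```python
import math


def standard_error_of_estimate(sd_criterion: float, multiple_r: float) -> float:
    """Textbook standard error of estimate for a given multiple R."""
    return sd_criterion * math.sqrt(1.0 - multiple_r ** 2)


SEE_OTIS_ONLY = 2.192  # reported in the text for R = .290

# Implied criterion SD (an inference from the reported figures, not a quoted value).
sd_implied = SEE_OTIS_ONLY / math.sqrt(1.0 - 0.290 ** 2)

# Standard error of estimate with all five predictors (R = .398).
see_five = standard_error_of_estimate(sd_implied, 0.398)
relative_gain = (SEE_OTIS_ONLY - see_five) / SEE_OTIS_ONLY

print(f"implied criterion SD: {sd_implied:.3f}")
print(f"SEE with five predictors: {see_five:.3f}")
print(f"relative reduction: {relative_gain:.1%}")
```

Under these assumptions the computed five-predictor value agrees closely with the reported 2.102, and the reduction amounts to only a few percent of the single-test figure.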

Assuming validity for the criterion 
measure, verbal intelligence is impor- 
tant primarily in predicting success and 
secondarily in predicting persuasive 
ability. This is reasonable on a face-validity
basis. However, one would expect
that some technical aspect of factory
work would also enter the picture. The 
multiple regression equation does not 
bear out this expectation, possibly because:
(a) technical knowledge and technical
ability as measured by the mechanical
tests employed are actually not important,
or (b) we have not really measured
technical knowledge and ability
with the two mechanical tests, or (c) these 
technical qualities are actually to a large 
degree mental ability. 

Beyond the Otis score, variables in- 
cluded in this battery of tests do not 
materially improve the prediction of the 
criterion. The lack of reduction in the 
standard error of estimate, in addition to 
other factors, suggests that management 
might well confine itself to a small num- 
ber of variables rather than administer- 
ing a comprehensive battery of tests 
when considering a man for the position 
of foreman. 

If the selection ratio is high, i.e., if a 
few foremen are to be selected out of a 
large number of applicants, the weights 
may be used to advantage. If the selection 
ratio is small, there will be a limited im- 
provement over chance selection because 
of the size of the standard error. 


VII. SUMMARY 


This study was designed to investigate 
the importance of certain personality 
characteristics, skills, and abilities in 
effectiveness as a factory foreman. It was 
the purpose of this study to determine 
the value of certain psychometric instru- 
ments and other predictor variables in 
predicting success in this job in the to- 
bacco company involved in this study. 

The first step was to determine the nature
of the rating form used as the criterion,
then to determine its value as
such a yardstick. Its reliability could be 
determined in certain respects, but its 
validity could only be assumed because 
of the difficulty of assessing validity of 
such a measure. The reliability of the 
criterion measures proved sufficiently 
high to justify confidence in them. How- 
ever, a crude cluster analysis of the parts 
of the rating form suggested that separate
factors were not measured by the
rating; the ratings appear to reflect considerable
“halo effect.”

Following this analysis, the linear re- 
lationships between each of the predic- 
tors and each of the criterion measures 
were determined. The nature of the in- 
terrelationships was expressed in terms 
of product-moment correlations. Using 
these data and the finally selected single 
criterion measure, the Wherry-Doolittle 
test selection method was utilized in or- 
der to maximally predict the rating, 
which consisted of the combined judg- 
ments of three qualified raters. 

It was found that five predictor var- 
iables contributed to the multiple R. 
The following sequence gives the order 
of their selection by the Wherry-Doolittle 
method, together with the multiple R as 
each variable is added: Otis score (.290),
Ess-Ay Inventory score (.333), Arithmetical
Problems score (.360), years of
education (.387), and age (.398). Age was 
negatively correlated with the criterion. 


A formula for the prediction of the criterion
based on these five predictors was
worked out on the basis of standard
scores.

Of the 27 predictor variables, 13 were 
correlated significantly with the criterion, 
at the 5% level or better. Seven of these 
correlations reached the 1% level of sig- 
nificance. These seven variables in the 
order of their correlation with the 
chosen criterion are: 

1. Otis Self-Administering Test of Mental Ability.
2. Test of Mechanical Ability, over-all score.
3. B-B-S Inventory, over-all score.
4. Test of Mechanical Ability, Part I, Spatial Visualization.
5. Ess-Ay Inventory.
6. B-B-S Inventory, Part VI, Arithmetical Problems.
7. B-B-S Inventory, Part II, Computation.

Many of the tests that were more 
highly correlated with the criterion did 
not enter the multiple regression equa- 
tion, primarily because of a high corre- 
lation with intelligence, the first variable 
included. The majority of variables included
in the analysis did not appear to
be of significant importance in the job 
performed by these factory foremen. And, 
as in similar research studies, there were 
few variables uncovered that contributed 
to the prediction of a criterion such as 
the one selected. 

It seems justified to conclude that in 
a situation such as the one described in 
this study, a large battery of tests cannot 
be justified even if the time is available. 
In this case the major contribution of 
.290 to the multiple R of .398 was made
by a test purporting to measure verbal 
intelligence. Comparatively little of re- 
liable value was added by the other tests. 

Further research in this area should 
seek to find instruments that measure 
factors that can successfully predict the 
criterion despite their lack of correlation 
with intelligence.